Scaling Pyramid
Scaling an analytics platform can be complicated due to the many variables and factors that affect the performance of the “analytics stack”. These factors include hardware choices; networking; the host operating system; the client browser; the underlying data stack technologies; and the data footprint itself. Other key variables relate to activity volumes - which is a combination of concurrent user patterns; typical query sizes / complexity; and content design.
Owing to this complexity, there is no set formula that can be easily applied to every deployment scenario, beyond the technical steps needed to create a scaled-out instance. Instead, there are a variety of best practices and ideas on how to create scale and ways to measure what factors in the stack need attention. The good news is that Pyramid was designed to scale up and out to meet those needs. However, you will need to consider other aspects of your deployment in arriving at the right solution.
Scaling Guide
- This Scaling Guide is designed to help you come to the right decisions.
Scaling FAQ
How is the platform scaling and parallelizing a single task (as opposed to running multiple tasks in parallel)?
Interactive query operation parallelization is performed by sharding a single query into its sub parts and running them in parallel. The PQL engine stitches the responses back together for a final response to the client. The engine is designed to handle multiple concurrent queries. In a multi-node cluster, each query will be sent to different RTE engines. But a single RTE will handle all the aspects for a given query.
Task operations are spread across all TE engines in the cluster. So a given execution may contain multiple tasks, that may each be handled by different servers. Each TE engine handles one single task job.
- For ETL, the data ingestion and processing is done at the row level, in parallel. So multiple rows will be read in parallel and processed individually. This extends to parallel reading of source tables and streaming the cleansed data in parallel to the destination.
- For publications, parallelization is performed by dividing each execution job into multiple sub-tasks and running them in parallel
Is this scaling dynamic or manual? In case of multiple concurrent tasks, how is the allocation performed?
Admins can control two main settings to influence the scaling – which is handled automatically.
- For batch engine tasks, admins can control the limits of concurrency with peak/off peak parallelization settings to ensure real-time resources are not overwhelmed during normal business hours.
- Load balancing across the system can be set to round robin or performance based. For performance based LB, the system dynamically allocates jobs based on the available resources (CPU, memory) of the machine running the task.
Can a task be prioritized (assigned more resources dynamically) once submitted?
Task prioritization is handled internally based on a well-honed decision algorithm. This cannot be adjusted. Priority is given to real-time querying, followed by user-initiated printing and ETL previews, followed by scheduled events like publications, ETL and alerts. Priority between batch and real-time events is mostly solved by splitting the RTE services and TE services to separate servers. Its also possible to split TE services for data ETL processing from publication / print processing for added control of resources.